438 research outputs found

    Detecting word substitutions in text

    Searching for words on a watchlist is one way to conduct large-scale surveillance of communication, for example in intelligence and counterterrorism settings. One obvious defense is to replace words that might attract attention to a message with other, more innocuous, words. For example, the sentence "the attack will be tomorrow" might be altered to "the complex will be tomorrow", since 'complex' is a word whose frequency is close to that of 'attack'. Such substitutions are readily detectable by humans since they do not make sense. We address the problem of detecting such substitutions automatically, by looking for discrepancies between words and their contexts, and using only syntactic information. We define a set of measures, each of which is quite weak, but which together produce per-sentence detection rates around 90% with false positive rates around 10%. Rules for combining per-sentence detection into per-message detection can reduce the false positive and false negative rates for messages to practical levels. We test the approach using sentences from the Enron email and Brown corpora, representing informal and formal text respectively.
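
    The abstract does not state which combination rules are used, so the following is only an illustrative sketch of how per-sentence rates could translate into per-message rates. It assumes a hypothetical "flag the message if at least k of its n sentences are flagged" rule, independent sentences, and (for the detection case) that every sentence in the message contains a substitution. The function name, the message length of 5 and the threshold of 3 are all assumptions made for this example.

```python
from math import comb


def at_least_k_of_n(p: float, n: int, k: int) -> float:
    """Probability that at least k of n independent sentences are flagged
    when each one is flagged with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Approximate per-sentence rates quoted in the abstract.
SENTENCE_DETECTION_RATE = 0.90
SENTENCE_FALSE_POSITIVE_RATE = 0.10

# Hypothetical rule (not from the paper): flag a 5-sentence message
# when 3 or more of its sentences are flagged.
N_SENTENCES, THRESHOLD = 5, 3

# Clean message: every flagged sentence is a false positive.
message_fp = at_least_k_of_n(SENTENCE_FALSE_POSITIVE_RATE, N_SENTENCES, THRESHOLD)

# Altered message, assuming all 5 sentences contain a substitution.
message_detection = at_least_k_of_n(SENTENCE_DETECTION_RATE, N_SENTENCES, THRESHOLD)
message_fn = 1 - message_detection

print(f"per-message false positive rate: {message_fp:.5f}")
print(f"per-message false negative rate: {message_fn:.5f}")
```

    Under these assumptions both per-message error rates fall below 1%, which illustrates why combining weak per-sentence decisions can help. Real messages would violate the independence assumption to some degree.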

    Mining Large Data Sets on Grids: Issues and Prospects

    When data mining and knowledge discovery techniques must be used to analyze large amounts of data, high-performance parallel and distributed computers can help to provide better computational performance and, as a consequence, deeper and more meaningful results. Recently grids, composed of large-scale, geographically distributed platforms working together, have emerged as effective architectures for high-performance decentralized computation. It is natural to consider grids as tools for distributed data-intensive applications such as data mining, but the underlying patterns of computation and data movement in such applications are different from those of more conventional high-performance computation. These differences require a different kind of grid, or at least a grid with significantly different emphases. This paper discusses the main issues, requirements, and design approaches for the implementation of grid-based knowledge discovery systems. Furthermore, some prospects and promising research directions in data-centric and knowledge-discovery-oriented grids are outlined.

    Wars Without Beginning or End: Violent Political Organizations and Irregular Warfare in the Sahel-Sahara

    This article examines the structure and spatial patterns of violent political organizations in the Sahel-Sahara, a region characterized by growing political instability over the last 20 years. Drawing on a public collection of disaggregated data, the article uses network science to represent alliances and conflicts of 179 organizations that were involved in violent events between 1997 and 2014. To this end, we combine two spectral embedding techniques that have previously been considered separately: one for directed graphs (relationships are asymmetric), and one for signed graphs (relationships are positive or negative). Our results show that groups that are net attackers are indistinguishable at the level of their individual behavior, but clearly separate into pro- and anti-political violence based on the groups to which they are close. The second part of the article maps a series of 389 events related to nine Trans-Saharan Islamist groups between 2004 and 2014. Spatial analysis suggests that cross-border movement has intensified following the establishment of military bases by AQIM in Mali but reveals no evidence of a border sanctuary. Owing to the transnational nature of conflict, the article shows that national management strategies and foreign military interventions have profoundly affected the movement of Islamist groups.

    Novel Idea Generation, Collaborative Filtering, and Group Innovation Processes

    Organizations that innovate encounter challenges due to the complexity and ambiguity of generating and making sense of novel ideas. We describe these challenges, which are exacerbated in group settings, and propose potential solutions. Specifically, we design group processes to support novel idea generation and selection, including use of a novel-information discovery (NID) tool to support creativity and brainstorming, as well as group support system and collaborative-filtering tools to support evaluation and decision making. Results indicate that the NID tool increases efficiency and effectiveness in creative tasks and that the collaborative-filtering tool can support the decision-making process by focusing the group’s attention on ideas that might otherwise be neglected. Combining these two novel tools with group processes provides valuable contributions to both research and practice.

    Relational Autoencoder for Feature Extraction

    Feature extraction becomes increasingly important as data grows high dimensional. The autoencoder, a neural-network-based feature extraction method, has achieved great success in generating abstract features of high-dimensional data. However, it fails to consider the relationships among data samples, which may affect experimental results of using original and new features. In this paper, we propose a Relational Autoencoder model that considers both data features and their relationships. We also extend it to work with other major autoencoder models, including the Sparse Autoencoder, Denoising Autoencoder and Variational Autoencoder. The proposed relational autoencoder models are evaluated on a set of benchmark datasets, and the experimental results show that considering data relationships can generate more robust features, which achieve lower reconstruction loss and, in turn, a lower error rate in subsequent classification than the other autoencoder variants. Comment: IJCNN-201

    Inductive Discovery Of Criminal Group Structure Using Spectral Embedding

    Social network analysis has often been applied to criminal groups to understand their internal structure and dynamics. While the content of communications is often restricted by constitutional and procedural constraints, data about communications is often more readily accessible. This article applies advanced network analysis techniques based on spectral embedding to such traffic data. Spectral embedding facilitates deeper analysis by embedding the graph representing a social network in a geometric space such that Euclidean distance reflects pairwise node dissimilarity. This enables visualizing a network in ways that accurately reflect the structure of the underlying group, and computing properties directly from the embedding. We illustrate spectral approaches for two 'Ndrangheta drug-smuggling networks, and extend them to a) examine triad structure (through the identification of the Simmelian backbone), which elicits key members, and b) display temporal properties, which illustrate changing group structure. Although the two groups have the same purpose and come from the same criminal milieu, they have substantially different internal structure, which was not detectable using conventional social-network approaches. The techniques presented in this study may support law enforcement in the early stages of an investigation.
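
    As a rough illustration of the idea that spectral embedding places nodes so that Euclidean distance reflects dissimilarity, the sketch below embeds a toy graph using eigenvectors of the unnormalized graph Laplacian. The choice of Laplacian, the function name `spectral_embedding`, and the six-node example graph are assumptions for this example; the abstract does not specify the exact variant the article uses.

```python
import numpy as np


def spectral_embedding(adjacency: np.ndarray, k: int = 1) -> np.ndarray:
    """Embed the nodes of a connected, undirected graph in k dimensions.

    Uses the unnormalized Laplacian L = D - A (an assumption made here)
    and returns the eigenvectors for the k smallest non-zero eigenvalues.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    # Skip the first (constant) eigenvector, which carries no structure
    # when the graph is connected.
    return eigenvectors[:, 1 : k + 1]


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

coords = spectral_embedding(A, k=1)


def distance(i: int, j: int) -> float:
    """Euclidean distance between nodes i and j in the embedding."""
    return float(np.linalg.norm(coords[i] - coords[j]))


print("same triangle   :", distance(0, 1))
print("across triangles:", distance(0, 5))
```

    In this toy case, nodes in the same triangle land close together and nodes in different triangles land far apart, so cluster structure can be read off the coordinates directly.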

    Multiplicity Structure of the Hadronic Final State in Diffractive Deep-Inelastic Scattering at HERA

    The multiplicity structure of the hadronic system X produced in deep-inelastic processes at HERA of the type ep -> eXY, where Y is a hadronic system with mass M_Y < 1.6 GeV and where the squared momentum transfer at the pY vertex, t, is limited to |t| < 1 GeV^2, is studied as a function of the invariant mass M_X of the system X. Results are presented on multiplicity distributions and multiplicity moments, rapidity spectra and forward-backward correlations in the centre-of-mass system of X. The data are compared to results in e+e- annihilation, fixed-target lepton-nucleon collisions, hadro-produced diffractive final states and to non-diffractive hadron-hadron collisions. The comparison suggests a production mechanism of virtual photon dissociation which involves a mixture of partonic states and a significant gluon content. The data are well described by a model, based on a QCD-Regge analysis of the diffractive structure function, which assumes a large hard gluonic component of the colourless exchange at low Q^2. A model with soft colour interactions is also successful. Comment: 22 pages, 4 figures, submitted to Eur. Phys. J., error in first submission - omitted bibliograph